DM-54879: Support reprocessing when upstream outputs are selectively retained#561

Open
hsinfang wants to merge 6 commits into main from tickets/DM-54879

@hsinfang hsinfang commented May 12, 2026

Contributor

Checklist

  • ran Jenkins
  • ran and inspected package-docs build
  • added a release note for user-visible changes to doc/changes

@hsinfang hsinfang force-pushed the tickets/DM-54879 branch from d4f69bd to 4bf23e3 Compare May 12, 2026 22:31
codecov Bot commented May 12, 2026

Codecov Report

❌ Patch coverage is 97.74775% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 88.81%. Comparing base (85e053f) to head (abb7cc9).
✅ All tests successful. No failed tests found.

Files with missing lines                          Patch %   Lines
python/lsst/pipe/base/quantum_graph_builder.py    93.42%    2 Missing and 3 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #561      +/-   ##
==========================================
+ Coverage   88.79%   88.81%   +0.02%     
==========================================
  Files         160      160              
  Lines       22120    22326     +206     
  Branches     2625     2656      +31     
==========================================
+ Hits        19641    19829     +188     
- Misses       1843     1853      +10     
- Partials      636      644       +8     


@hsinfang hsinfang force-pushed the tickets/DM-54879 branch from 4bf23e3 to d6ef7aa Compare May 12, 2026 22:45
  return zstandard.ZstdCompressionDict(b"")
  self.comms.log.info("Training compression dictionary.")
- training_inputs: list[bytes] = []
+ training_inputs: list[bytes | bytearray | memoryview[int]] = []

Member

I'm curious where this is coming from; AFAIK we don't use bytearray or memoryview[int] for any of these.

Contributor Author

I took it directly from mypy's suggestion, but this has now been fixed in ac0bcb1.

Comment thread tests/test_graphBuilder.py Outdated
Comment thread python/lsst/pipe/base/quantum_graph_builder.py Outdated
Comment thread python/lsst/pipe/base/quantum_graph_builder.py Outdated
Comment thread python/lsst/pipe/base/quantum_graph_builder.py Outdated
@hsinfang hsinfang force-pushed the tickets/DM-54879 branch 4 times, most recently from 3e4cff4 to 52abd2a Compare June 3, 2026 18:23
@hsinfang hsinfang force-pushed the tickets/DM-54879 branch 3 times, most recently from 8237d96 to 4bdec93 Compare June 8, 2026 22:27
timj commented Jun 9, 2026

Member

@hsinfang are you imminently merging this or is it going to be a few days? I am trying to sync with the v30 release so wondered what your plan was.

hsinfang commented Jun 9, 2026

Contributor Author

@timj this won't be imminent, and can wait longer too if that makes other things easier.

@hsinfang hsinfang force-pushed the tickets/DM-54879 branch 5 times, most recently from ed486d7 to 0fad777 Compare June 16, 2026 17:52
hsinfang added 3 commits June 16, 2026 10:54
The skip_existing_in behavior of QuantumGraphBuilder was previously
covered only through test_separable_pipeline_executor.py, where
SeparablePipelineExecutor drives AllDimensionsQuantumGraphBuilder.
No tests exercised the builder directly at the unit level.

Extract the read-only metadata check and the skeleton mutation in
_skip_quantum_if_metadata_exists into two helpers,
_compute_skip_decision and _apply_skip_decision.
No behavior change.
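
The split described in this commit message, a read-only decision followed by a separate mutation, can be illustrated with a minimal sketch. This is not the code from the PR: the helper names echo the commit message, but the signatures, the `SkipDecision` and `Skeleton` classes, and the use of plain strings as quantum identifiers are all hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SkipDecision:
    """Hypothetical result of the read-only metadata check."""

    quantum_id: str
    skip: bool


@dataclass
class Skeleton:
    """Hypothetical stand-in for a quantum graph skeleton."""

    quanta: set[str] = field(default_factory=set)
    skipped: set[str] = field(default_factory=set)


def compute_skip_decision(quantum_id: str, existing_metadata: set[str]) -> SkipDecision:
    # Read-only step: inspects its arguments and mutates nothing.
    return SkipDecision(quantum_id, skip=quantum_id in existing_metadata)


def apply_skip_decision(skeleton: Skeleton, decision: SkipDecision) -> None:
    # Mutation step: acts only on an already-computed decision.
    if decision.skip:
        skeleton.quanta.discard(decision.quantum_id)
        skeleton.skipped.add(decision.quantum_id)


def skip_quantum_if_metadata_exists(
    skeleton: Skeleton, quantum_id: str, existing_metadata: set[str]
) -> bool:
    # The original entry point becomes a thin composition of the two helpers,
    # so its observable behavior is unchanged.
    decision = compute_skip_decision(quantum_id, existing_metadata)
    apply_skip_decision(skeleton, decision)
    return decision.skip
```

Separating the two steps lets a caller compute decisions for many quanta first and revise them before anything is mutated, and it lets the decision logic be unit-tested without building a skeleton.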
@hsinfang hsinfang changed the title from "DM-54879: Add --ignore-existing-metadata-for for when upstream outputs are selectively retained" to "DM-54879: Support reprocessing when upstream outputs are selectively retained" Jun 16, 2026
hsinfang added 3 commits June 16, 2026 11:48
…quanta

Daytime AP runs against data produced by Prompt Processing, which does
not retain all intermediate outputs.  With --skip-existing-in, tasks
whose metadata exists are skipped even when their outputs are absent.
When a downstream task then needs to run, it may not find some of its
inputs and is dropped with "no work found".

retained_dataset_types lists the dataset types expected to exist in
skip_existing_in. Non-retained types trigger backward unskipping
of the ancestor quanta needed to regenerate them.

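
The backward unskipping described above can be sketched as a graph walk. This is a simplified illustration under stated assumptions, not the implementation in quantum_graph_builder.py: the function name `unskip_ancestors`, the dictionary-based graph, and the task and dataset type names are all hypothetical, and each dataset type is assumed to have at most one producer.

```python
from collections import deque


def unskip_ancestors(
    inputs: dict[str, list[str]],
    producers: dict[str, str],
    skipped: set[str],
    retained: set[str],
) -> set[str]:
    """Return the quanta that can stay skipped (hypothetical helper).

    inputs:    quantum -> dataset types it consumes (keys cover all quanta)
    producers: dataset type -> quantum that produces it
    skipped:   quanta whose metadata already exists
    retained:  dataset types expected to exist in the skip-existing collection
    """
    still_skipped = set(skipped)
    # Start from every quantum that has to run anyway.
    queue = deque(q for q in inputs if q not in still_skipped)
    visited: set[str] = set()
    while queue:
        quantum = queue.popleft()
        if quantum in visited:
            continue
        visited.add(quantum)
        for dataset_type in inputs[quantum]:
            if dataset_type in retained:
                continue  # expected to exist, nothing to regenerate
            producer = producers.get(dataset_type)
            if producer is None:
                continue  # overall input with no producing quantum
            if producer in still_skipped:
                # The output was not kept, so the producer must run again,
                # and its own inputs must be checked in turn.
                still_skipped.discard(producer)
                queue.append(producer)
    return still_skipped


# Hypothetical three-task pipeline: taskC must run and needs both a_out
# (not retained) and b_out (retained).
inputs = {"taskA": ["raw"], "taskB": ["a_out"], "taskC": ["a_out", "b_out"]}
producers = {"a_out": "taskA", "b_out": "taskB", "c_out": "taskC"}
remaining = unskip_ancestors(inputs, producers, {"taskA", "taskB"}, {"b_out"})
```

In this example taskA is unskipped because taskC needs a_out, while taskB stays skipped because its output is retained and nothing that runs needs it regenerated.
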
SeparablePipelineExecutor is not used by pipetask, but we might
as well extend the same option to it and get it tested there.

3 participants